Learning a Robust Consensus Matrix for Clustering Ensemble via Kullback-Leibler Divergence Minimization
نویسندگان
چکیده
Clustering ensemble has emerged as an important extension of the classical clustering problem. It provides a framework for combining multiple base clusterings of a data set to generate a final consensus result. Most existing clustering methods simply combine clustering results without taking into account the noises, which may degrade the clustering performance. In this paper, we propose a novel robust clustering ensemble method. To improve the robustness, we capture the sparse and symmetric errors and integrate them into our robust and consensus framework to learn a low-rank matrix. Since the optimization of the objective function is difficult to solve, we develop a block coordinate descent algorithm which is theoretically guaranteed to converge. Experimental results on real world data sets demonstrate the effectiveness of our method.
منابع مشابه
An Abstract Weighting Framework for Clustering Algorithms
Recent works in unsupervised learning have emphasized the need to understand a new trend in algorithmic design, which is to influence the clustering via weights on the instance points. In this paper, we handle clustering as a constrained minimization of a Bregman divergence. Theoretical results show benefits resembling those of boosting algorithms, and bring new modified weighted versions of cl...
متن کاملA Robust Symmetric Nonnegative Matrix Factorization Framework for Clustering Multiple Heterogeneous Microbiome Data
Integration of multi-view datasets which are comprised of heterogeneous sources or different representations is challenging to understand the subtle and complex relationship in data. Such data integration methods attempt to combine efficiently the complementary information of multiple data types to construct a comprehensive view of underlying data. Nonnegative matrix factorization (NMF), an app...
متن کاملExploiting the Limits of Structure Learning via Inherent Symmetry
This theoretical paper is concerned with the structure learning limit for Gaussian Markov random fields from i.i.d. samples. The common strategy is applying the Fano method to a family of restricted ensembles. The efficiency of this method, however, depends crucially on selected restricted ensembles. To break through this limitation, we analyze the whole graph ensemble from a group theoretical ...
متن کاملAn Introduction to Inference and Learning in Bayesian Networks
Bayesian networks (BNs) are modern tools for modeling phenomena in dynamic and static systems and are used in different subjects such as disease diagnosis, weather forecasting, decision making and clustering. A BN is a graphical-probabilistic model which represents causal relations among random variables and consists of a directed acyclic graph and a set of conditional probabilities. Structure...
متن کاملModel Confidence Set Based on Kullback-Leibler Divergence Distance
Consider the problem of estimating true density, h(.) based upon a random sample X1,…, Xn. In general, h(.)is approximated using an appropriate in some sense, see below) model fƟ(x). This article using Vuong's (1989) test along with a collection of k(> 2) non-nested models constructs a set of appropriate models, say model confidence set, for unknown model h(.).Application of such confide...
متن کامل